Observing the significance of race proportions for spatial mismatch, using correlation testing.
### Seattle Model 2
\[ \widehat{\text{Spatial Mismatch}} = e^{-0.122946 \cdot PC1 - 2.663302} \]
This model has \( R^{2}_{adj} = 0.16 \). We applied a logarithmic transformation to spatial mismatch because the residuals of the linear model were skewed to the right. Additionally, the normal probability plot shows a slight upward curve, a pattern that a log or polynomial transformation typically corrects.
Furthermore, \( R^{2}_{adj} \) has increased from \( R^{2}_{adj} = 0.13 \) in the untransformed model. Although the gain is modest, we chose this simpler single-predictor model because the relationship between Principal Component 1 and spatial mismatch is easy to read. Recall that the main contributors to Principal Component 1 were the original variables \(\textit{bachelorProp}\) and \(\textit{hsdiplomaProp}\), which load with opposite signs. The variable \(\textit{hsdiplomaProp}\) has a negative weight of \(-0.43\), which suggests that as the proportion of people with a high school diploma rises above typical levels, Principal Component 1 decreases. Since Principal Component 1 has a negative coefficient in our model, block groups with a higher-than-average proportion of high school diploma holders tend to have higher predicted spatial mismatch.
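Because the model is fit on the log scale, its coefficients read as multiplicative effects once exponentiated. The sketch below plugs in the coefficients reported for Seattle Model 2; the helper name `predict_sm` is hypothetical, and the mean correction assumes roughly normal errors on the log scale.

```r
# Coefficients copied from the Seattle Model 2 output: log(spatial mismatch) ~ PC1
b0 <- -2.663302   # intercept (log scale)
b1 <- -0.122946   # PC1 slope (log scale)
sigma <- 0.4678   # residual standard error (log scale)

# Hypothetical helper: back-transformed prediction on the spatial-mismatch scale
predict_sm <- function(pc1) exp(b0 + b1 * pc1)

ratio <- exp(b1)                 # multiplicative change per one-unit increase in PC1
pct_change <- 100 * (ratio - 1)  # the same change expressed as a percentage

# exp() of a log-scale fit estimates a median (geometric mean); if the log-scale
# errors are roughly normal, the mean is larger by a factor of exp(sigma^2 / 2)
mean_factor <- exp(sigma^2 / 2)

round(c(pred_at_0 = predict_sm(0), ratio = ratio,
        pct_change = pct_change, mean_factor = mean_factor), 4)
```

Here \( e^{-0.122946} \approx 0.884 \), so each one-unit increase in PC1 corresponds to roughly an 11.6% lower predicted spatial mismatch.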
##
## Call:
## lm(formula = logspatialmismatch ~ PC1, data = Seattle2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1717 -0.2597 0.0263 0.2928 1.5678
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.663302 0.009401 -283.30 <2e-16 ***
## PC1 -0.122946 0.005758 -21.35 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4678 on 2474 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1556, Adjusted R-squared: 0.1553
## F-statistic: 455.9 on 1 and 2474 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = logspatialmismatch ~ logPC1, data = Seattle2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.2834 -0.2751 0.0417 0.3180 1.6089
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.84097 0.01530 -185.64 <2e-16 ***
## logPC1 -0.15240 0.01452 -10.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5208 on 1157 degrees of freedom
## (1318 observations deleted due to missingness)
## Multiple R-squared: 0.08691, Adjusted R-squared: 0.08612
## F-statistic: 110.1 on 1 and 1157 DF, p-value: < 2.2e-16
## [1] 357 904
Analysis of the Seattle Best Model:
As a reminder, the heaviest weights in Principal Component 1 for Seattle belong to \(\textit{hsdiplomaProp}\), the proportion of low-income people in a block group who have attained a high school education, and \(\textit{bachelorProp}\), the proportion of low-income people in a block group who have attained a bachelor's degree.
The higher the proportion of low-income people in a block group who have attained a high school diploma (relative to average), the higher the spatial mismatch. In the graph, the rise in spatial mismatch is especially visible when Principal Component 1 is between -2 and 0.
Conversely, the higher the proportion of low-income people in a block group who have attained a bachelor's degree (relative to average), the lower the spatial mismatch. As Principal Component 1 approaches 4, spatial mismatch approaches 0.
Something interesting to note is that the variables \(\textit{HispanicProp}\), \(\textit{BlackProp}\), and \(\textit{WhiteProp}\) also have relatively large loadings on Principal Component 1: \(\textit{HispanicProp} = -0.37\), \(\textit{BlackProp} = -0.29\), and \(\textit{WhiteProp} = 0.29\). As Hispanic proportions rise above average in a block group, Principal Component 1 decreases. Similarly, as Black proportions rise above average, Principal Component 1 also decreases. Conversely, as White proportions rise above average, Principal Component 1 increases.
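The sign reasoning above can be made concrete by multiplying each quoted loading by the PC1 coefficient. This is a rough sketch: it assumes the quoted loadings belong to the same PC1 used in Seattle Model 2 and that PC1 is a weighted sum of standardized variables, and it holds the other variables fixed even though they are correlated in practice.

```r
# PC1 slope from Seattle Model 2 (log scale)
b1 <- -0.122946
# PC1 loadings as quoted in the text
loadings_pc1 <- c(hsdiplomaProp = -0.43, HispanicProp = -0.37,
                  BlackProp = -0.29, WhiteProp = 0.29)

# A one-SD rise in a standardized variable, others held fixed, moves PC1 by
# its loading, so log(spatial mismatch) moves by b1 * loading
log_shift <- b1 * loadings_pc1

# Convert the log-scale shift into a percentage change in predicted spatial mismatch
pct_change <- 100 * (exp(log_shift) - 1)
round(pct_change, 1)
```

Positive values mean higher predicted spatial mismatch; the signs are the opposite of the loadings because the PC1 coefficient is negative.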
In general, Principal Component 1 primarily captures educational characteristics, with heavy emphasis on high school diplomas and bachelor's degrees.
Interpreting our results, this pattern can be tied to education: historically, Black and Hispanic populations have had lower levels of educational attainment than White and Asian populations. This can contribute to some of the spatial mismatch among low-income job seekers.
This is exactly the same model as the original, despite the different transformations.
#Normalizing Data
normalized_SeattleSpatial<- scale(Seattle_spatial)
#Compute Correlation Matrix
SeattleSpatialcorr_matrix <- cor(normalized_SeattleSpatial)
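ggcorrplot(SeattleSpatialcorr_matrix, colors = c("#d8b365", "#f5f5f5", "#5ab4ac"))
#Application
# Note: princomp() treats the 9 rows of this correlation matrix as observations;
# princomp(Seattle_spatial, cor = TRUE) would decompose the correlation matrix itself
PCA_SeattleSpatial <- princomp(SeattleSpatialcorr_matrix)
summary(PCA_SeattleSpatial)
## Importance of components: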
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 0.9131271 0.6782956 0.30196739 0.27082069 0.22615237
## Proportion of Variance 0.5284011 0.2915676 0.05778583 0.04647987 0.03241183
## Cumulative Proportion 0.5284011 0.8199686 0.87775446 0.92423434 0.95664617
## Comp.6 Comp.7 Comp.8 Comp.9
## Standard deviation 0.20036499 0.15762603 0.058471772 0
## Proportion of Variance 0.02544163 0.01574552 0.002166675 0
## Cumulative Proportion 0.98208780 0.99783332 1.000000000 1
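The call `princomp(SeattleSpatialcorr_matrix)` passes the correlation matrix in as if it were data, so its nine rows are treated as nine observations; after centering, nine rows span at most eight dimensions, which is consistent with the zero variance reported for Comp.9. A small check on R's built-in `USArrests` data (used only for illustration) compares that pattern with the usual ways of running a correlation-matrix PCA.

```r
# Built-in data, used only for illustration (not the Seattle data)
X <- as.matrix(USArrests)
R <- cor(X)

# Standard routes to a correlation-matrix PCA
pca_prcomp   <- prcomp(X, center = TRUE, scale. = TRUE)  # PCA of standardized data
pca_princomp <- princomp(X, cor = TRUE)                  # eigen-decomposition of cor(X)
eig_R <- eigen(R, symmetric = TRUE)$values

# Pattern used in this notebook: the correlation matrix passed as data,
# so its p rows are treated as p observations
pca_on_R <- princomp(R)

# Component variances from each approach
rbind(prcomp   = pca_prcomp$sdev^2,
      princomp = unname(pca_princomp$sdev^2),
      on_R     = unname(pca_on_R$sdev^2))
```

The first two rows agree with the eigenvalues of the correlation matrix, which sum to the number of variables; the last row does not, and its final component has zero variance.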
fviz_pca_var(PCA_SeattleSpatial, col.var = "black")
PCA_SeattleSpatial$loadings[, 1:2]
## Comp.1 Comp.2
## hsdiplomaProp 0.30636245 0.36745555
## college1Prop 0.13603557 0.39255954
## somecollegeProp 0.08701663 0.30998611
## assProp 0.01287675 0.23809938
## bachelorProp -0.45329414 -0.33113369
## GEDProp 0.25521553 0.27477986
## p12Prop 0.46127318 -0.07529069
## WhiteProp -0.44763678 0.42115562
## POCProp 0.44287299 -0.43577113
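The projection in `SeattleSpatialpcs` multiplies standardized data by a loading matrix. When the loadings come from `prcomp()` on the same standardized data, that projection reproduces `prcomp()`'s own scores, and those scores are uncorrelated. A sketch on the built-in `USArrests` data, for illustration only:

```r
# Built-in data, used only for illustration
X <- as.matrix(USArrests)
pc <- prcomp(X, center = TRUE, scale. = TRUE)

# Standardize, then multiply by the loading (rotation) matrix
manual_scores <- scale(X) %*% pc$rotation

max(abs(manual_scores - pc$x))   # should be numerically zero
round(cor(pc$x)[1, 2], 10)       # PC1 and PC2 scores are uncorrelated
```

Because `prcomp()` scores are uncorrelated, dropping one component from a regression leaves the other's slope unchanged; the Baltimore output reflects this, with the PC2 estimate at -0.0032319 with or without PC1.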
SeattleSpatialpcs<-as.matrix(normalized_SeattleSpatial %*% PCA_SeattleSpatial$loadings[, 1:2])
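# Note: this PCA uses Seattle_finality3 rather than Seattle_spatial, which is likely
# why the models below match the original Seattle models
SeattleSpatialpc <- prcomp(Seattle_finality3, center = TRUE, scale. = TRUE)
Seattle3 <- as.data.frame(cbind(spatialmismatch = Seattle_finality$spatialmismatch, SeattleSpatialpc$x[, 1:2]))
### Seattle Spatial Model 1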
Since PC2 is not significant, we ignore it and focus on the relationship between PC1 and spatial mismatch.
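Whether PC2 can be dropped can also be checked with a partial F-test comparing the nested models. The sketch uses simulated data (the coefficients loosely echo the Seattle output but are hypothetical); for a single dropped term, the F statistic equals the square of that term's t value.

```r
set.seed(42)
n <- 500
# Hypothetical simulated component scores and response
sim <- data.frame(PC1 = rnorm(n), PC2 = rnorm(n))
sim$y <- 0.08 - 0.008 * sim$PC1 + rnorm(n, sd = 0.035)

full    <- lm(y ~ PC1 + PC2, data = sim)
reduced <- lm(y ~ PC1, data = sim)

# Partial F-test for dropping PC2
cmp <- anova(reduced, full)
t_pc2 <- summary(full)$coefficients["PC2", "t value"]
c(F = cmp$F[2], t_squared = t_pc2^2, p = cmp[["Pr(>F)"]][2])
```

Applied to `M21_PCA_SeattleSpatial` and `M2_PCA_SeattleSpatial`, `anova()` would give the same test as the PC2 row of the full model's summary.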
#Seattle Spatial Linear Regression with PCA
M2_PCA_SeattleSpatial <- lm(spatialmismatch ~ PC1 + PC2, data = Seattle3)
M21_PCA_SeattleSpatial <-lm(spatialmismatch ~ PC1, data=Seattle3)
summary(M21_PCA_SeattleSpatial)
##
## Call:
## lm(formula = spatialmismatch ~ PC1, data = Seattle3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.082157 -0.022923 -0.005411 0.016107 0.277901
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0782017 0.0006958 112.39 <2e-16 ***
## PC1 -0.0080735 0.0004260 -18.95 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03463 on 2475 degrees of freedom
## Multiple R-squared: 0.1267, Adjusted R-squared: 0.1264
## F-statistic: 359.2 on 1 and 2475 DF, p-value: < 2.2e-16
Analyzing Residuals
qqnorm(M21_PCA_SeattleSpatial$resid)
hist(M21_PCA_SeattleSpatial$resid)
# Transformation
Seattle3$logspatialmismatch <- log(Seattle3$spatialmismatch)
# log(0) returns -Inf; recode infinite values as NA so lm() drops those rows
Seattle3[is.na(Seattle3) | Seattle3 == "Inf" | Seattle3 == "-Inf"] <- NA
M22_PCA_SeattleSpatial<- lm(logspatialmismatch~PC1, data=Seattle3)
summary(M22_PCA_SeattleSpatial)
##
## Call:
## lm(formula = logspatialmismatch ~ PC1, data = Seattle3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1717 -0.2597 0.0263 0.2928 1.5678
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.663302 0.009401 -283.30 <2e-16 ***
## PC1 -0.122946 0.005758 -21.35 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4678 on 2474 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1556, Adjusted R-squared: 0.1553
## F-statistic: 455.9 on 1 and 2474 DF, p-value: < 2.2e-16
qqnorm(M22_PCA_SeattleSpatial$resid)
summary(lm(spatialmismatch ~ PC1 + I(PC1^2), data = Seattle3))
##
## Call:
## lm(formula = spatialmismatch ~ PC1 + I(PC1^2), data = Seattle3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.083368 -0.023477 -0.005039 0.015977 0.276032
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0821969 0.0009319 88.200 < 2e-16 ***
## PC1 -0.0078079 0.0004246 -18.387 < 2e-16 ***
## I(PC1^2) -0.0014974 0.0002347 -6.381 2.1e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03435 on 2474 degrees of freedom
## Multiple R-squared: 0.1409, Adjusted R-squared: 0.1402
## F-statistic: 202.8 on 2 and 2474 DF, p-value: < 2.2e-16
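One caution when comparing \( R^{2}_{adj} \) across these Seattle models: the log model's \( R^{2} \) measures fit on the log scale, while the linear and quadratic models measure fit on the raw scale, so the numbers are not on a common footing. One common workaround, sketched here on simulated data (all values hypothetical), is to back-transform the log model's fitted values and compute their squared correlation with the observed response on the raw scale.

```r
set.seed(7)
n <- 1000
pc1 <- rnorm(n)
# Hypothetical right-skewed response, generated on the log scale
y <- exp(-2.66 - 0.12 * pc1 + rnorm(n, sd = 0.47))

fit_raw <- lm(y ~ pc1)
fit_log <- lm(log(y) ~ pc1)

# For OLS with an intercept, R^2 equals the squared correlation of y and the fitted values
r2_raw <- cor(y, fitted(fit_raw))^2
# Same yardstick for the log model: back-transform, then correlate on the raw scale
r2_log_on_raw <- cor(y, exp(fitted(fit_log)))^2

c(raw_model = r2_raw, log_model_raw_scale = r2_log_on_raw,
  log_model_log_scale = summary(fit_log)$r.squared)
```

For the real models, the comparison should use matching rows, since the log model drops one observation.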
Baltimore_finality3 <- Baltimore_finality %>%
select(WhiteProp,AsianProp, HispanicProp, BlackProp, hsdiplomaProp,hsnodiplomaProp,somecollegeProp,college1Prop,assProp,bachelorProp)
#Normalizing Data
normalized_Baltimore<- scale(Baltimore_finality3)
#Compute Correlation Matrix
Baltimorecorr_matrix <- cor(normalized_Baltimore)
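ggcorrplot(Baltimorecorr_matrix, colors = c("#d8b365", "#f5f5f5", "#5ab4ac"))
#Application
# Note: princomp() treats the 10 rows of this correlation matrix as observations;
# princomp(Baltimore_finality3, cor = TRUE) would decompose the correlation matrix itself
PCA_Baltimore <- princomp(Baltimorecorr_matrix)
summary(PCA_Baltimore)
## Importance of components: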
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 0.9146061 0.4683604 0.3556335 0.28646943 0.27266128
## Proportion of Variance 0.5579587 0.1463168 0.0843605 0.05473818 0.04958847
## Cumulative Proportion 0.5579587 0.7042754 0.7886359 0.84337413 0.89296260
## Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## Standard deviation 0.25022145 0.21500609 0.2130455 0.079032559 0
## Proportion of Variance 0.04176215 0.03083439 0.0302746 0.004166255 0
## Cumulative Proportion 0.93472476 0.96555914 0.9958337 1.000000000 1
PCA_Baltimore$loadings[, 1:2]
## Comp.1 Comp.2
## WhiteProp 0.46412467 0.425302407
## AsianProp 0.22962986 -0.450309002
## HispanicProp 0.01412706 -0.233580886
## BlackProp -0.50592518 -0.255519570
## hsdiplomaProp -0.36562003 0.323135997
## hsnodiplomaProp -0.24195465 -0.007534153
## somecollegeProp -0.25237550 0.070823802
## college1Prop -0.07966594 0.490120134
## assProp 0.05248039 0.355653811
## bachelorProp 0.45900353 -0.142451251
fviz_pca_var(PCA_Baltimore, col.var = "black")
Baltimorepcs <- as.matrix(normalized_Baltimore %*% PCA_Baltimore$loadings[, 1:2])
Baltimorepc <- prcomp(Baltimore_finality3, center = TRUE, scale. = TRUE)
Baltimore2 <- as.data.frame(cbind(spatialmismatch = Baltimore_finality$spatialmismatch, Baltimorepc$x[, 1:2]))
# Baltimore Fitting Linear Model PCA
M1_PCA_Baltimore <- lm(spatialmismatch ~ PC1 + PC2, data = Baltimore2)
M11_PCA_Baltimore <-lm(spatialmismatch ~ PC2, data=Baltimore2)
summary(M1_PCA_Baltimore)
##
## Call:
## lm(formula = spatialmismatch ~ PC1 + PC2, data = Baltimore2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.09429 -0.03602 -0.01009 0.02535 0.23800
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0860580 0.0010714 80.323 < 2e-16 ***
## PC1 -0.0001101 0.0006290 -0.175 0.861030
## PC2 -0.0032319 0.0008665 -3.730 0.000197 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04718 on 1936 degrees of freedom
## Multiple R-squared: 0.007151, Adjusted R-squared: 0.006125
## F-statistic: 6.972 on 2 and 1936 DF, p-value: 0.0009619
summary(M11_PCA_Baltimore)
##
## Call:
## lm(formula = spatialmismatch ~ PC2, data = Baltimore2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.09433 -0.03617 -0.01014 0.02542 0.23814
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0860580 0.0010711 80.343 < 2e-16 ***
## PC2 -0.0032319 0.0008663 -3.731 0.000196 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04717 on 1937 degrees of freedom
## Multiple R-squared: 0.007135, Adjusted R-squared: 0.006622
## F-statistic: 13.92 on 1 and 1937 DF, p-value: 0.0001963
qqnorm(M11_PCA_Baltimore$resid)
hist(M11_PCA_Baltimore$resid)
### Baltimore Model 2
The log transformation is too strong: it leaves the residuals skewed to the left.
Baltimore2$logspatialmismatch<- log(Baltimore2$spatialmismatch)
Baltimore2[is.na(Baltimore2)| Baltimore2 == "Inf"| Baltimore2 == "-Inf"] = NA
M12_PCA_Baltimore<- lm(logspatialmismatch ~ PC2, data= Baltimore2)
summary(M12_PCA_Baltimore)
##
## Call:
## lm(formula = logspatialmismatch ~ PC2, data = Baltimore2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7320 -0.3870 0.0392 0.4115 1.4981
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.61274 0.01386 -188.492 < 2e-16 ***
## PC2 -0.06546 0.01121 -5.839 6.13e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6104 on 1937 degrees of freedom
## Multiple R-squared: 0.0173, Adjusted R-squared: 0.01679
## F-statistic: 34.1 on 1 and 1937 DF, p-value: 6.132e-09
hist(M12_PCA_Baltimore$resid)
### Baltimore Model 3
\[ \widehat{\text{Spatial Mismatch}} = (0.2826 - 0.0068 \cdot PC2)^2 \]
This model has \( R^{2}_{adj} = 0.01 \). We applied a square-root transformation to spatial mismatch because the residuals of the linear model were skewed to the right, and the logarithmic transformation overcorrected.
Compared with the log model's \( R^{2}_{adj} = 0.02 \), \( R^{2}_{adj} \) has decreased to 0.01, but this model is simpler to interpret. Despite the small loss in fit, we chose it because the relationship between Principal Component 2 and spatial mismatch is easier to read. Recall that the main contributors to Principal Component 2 were the original variables \(\textit{bachelorProp}\), \(\textit{WhiteProp}\), and \(\textit{BlackProp}\), which load with opposite signs. The variable \(\textit{BlackProp}\) has a negative weight of \(-0.50\), which suggests that as the proportion of Black residents rises above typical levels, Principal Component 2 decreases. Since Principal Component 2 has a negative coefficient in our model, block groups with a higher-than-average proportion of Black residents tend to have higher predicted spatial mismatch. Additionally, with \(\textit{bachelorProp} = 0.46\) and \(\textit{WhiteProp} = 0.46\), Principal Component 2 increases as these proportions rise above typical levels in a block group.
This suggests that education and race are factors that play into spatial mismatch.
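The choice between the log and square-root transformations can be checked numerically with the skewness of the residuals. The sketch simulates a response whose square root is linear in a stand-in for PC2 (loosely echoing the Baltimore Model 3 estimates; the data are hypothetical): the raw fit is right-skewed, the log overcorrects into left skew, and the square root lands near zero.

```r
set.seed(123)
n <- 2000
pc2 <- rnorm(n)
# Hypothetical response whose square root is linear in pc2 with normal noise
y <- (0.28 - 0.007 * pc2 + rnorm(n, sd = 0.078))^2

# Sample skewness: third central moment divided by the 1.5 power of the second
skew <- function(v) {
  v <- v - mean(v)
  mean(v^3) / mean(v^2)^1.5
}

s <- c(raw  = skew(resid(lm(y ~ pc2))),
       log  = skew(resid(lm(log(y) ~ pc2))),
       sqrt = skew(resid(lm(sqrt(y) ~ pc2))))
round(s, 2)
```

The same `skew()` helper could be applied to the residuals of `M11_PCA_Baltimore`, `M12_PCA_Baltimore`, and `M13_PCA_Baltimore` to back up the histograms.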
Baltimore2$sqrtspatialmismatch<- sqrt(Baltimore2$spatialmismatch)
Baltimore2[is.na(Baltimore2)| Baltimore2 == "Inf"| Baltimore2 == "-Inf"] = NA
M13_PCA_Baltimore<- lm(sqrtspatialmismatch ~ PC2, data= Baltimore2)
summary(M13_PCA_Baltimore)
##
## Call:
## lm(formula = sqrtspatialmismatch ~ PC2, data = Baltimore2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.253115 -0.059134 -0.007018 0.050727 0.287631
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.282593 0.001779 158.875 < 2e-16 ***
## PC2 -0.006802 0.001439 -4.728 2.43e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07832 on 1937 degrees of freedom
## Multiple R-squared: 0.01141, Adjusted R-squared: 0.0109
## F-statistic: 22.36 on 1 and 1937 DF, p-value: 2.427e-06
hist(M13_PCA_Baltimore$resid)
qqnorm(M13_PCA_Baltimore$resid)
car::qqPlot(M13_PCA_Baltimore$residuals)
## [1] 1706 1828
plot(M13_PCA_Baltimore$resid ~ M13_PCA_Baltimore$fitted)
abline(h = 0)
M14_PCA_Baltimore <- lm(spatialmismatch ~ PC2 + I(PC2^2), data = Baltimore2)
Observing the data with our best model, we see that our curve does not match the data. With our low \( R^{2}_{adj} = 0.01 \), we were not expecting a strong model. This suggests that race and education are not as good at predicting spatial mismatch in Baltimore as they are in Seattle.
Moreover, the curve shows no clear pattern between Principal Component 2 and spatial mismatch.
plot(spatialmismatch ~ PC2, data=Baltimore2,
xlab = "Principal Component 2",
ylab = "Spatial Mismatch")
# Back-transform the square-root model: predicted spatial mismatch = (b0 + b1 * PC2)^2
curve((coef(M13_PCA_Baltimore)[1] + coef(M13_PCA_Baltimore)[2] * x)^2, lwd = 2, add = TRUE, col = "blue")
ggplot(Baltimore2, aes(x = PC2, y = spatialmismatch)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
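Squaring the whole linear predictor gives the back-transformed Baltimore Model 3 curve. The sketch below, using the reported coefficients, checks that \((b_0 + b_1 x)^2\) matches its expansion \(b_0^2 + 2 b_0 b_1 x + b_1^2 x^2\) and, assuming roughly normal constant-variance errors on the square-root scale, adds the \(\sigma^2\) term that the mean of the squared response picks up.

```r
# Coefficients and residual SE copied from the Baltimore Model 3 output
b0 <- 0.282593     # intercept on the square-root scale
b1 <- -0.006802    # PC2 slope on the square-root scale
sigma <- 0.07832   # residual standard error on the square-root scale

# Back-transformed curve: square the whole linear predictor
sq_curve <- function(x) (b0 + b1 * x)^2
# Equivalent expanded form: b0^2 + 2*b0*b1*x + b1^2*x^2
expanded <- function(x) b0^2 + 2 * b0 * b1 * x + b1^2 * x^2

x <- c(-4, -2, 0, 2, 4)
all.equal(sq_curve(x), expanded(x))

# If Y = (m + e)^2 with e ~ N(0, sigma^2), then E[Y] = m^2 + sigma^2
c(squared_at_0 = sq_curve(0), mean_at_0 = sq_curve(0) + sigma^2)
```

With these numbers the squared curve gives about 0.080 at PC2 = 0 and the mean-adjusted value about 0.086, close to the 0.0861 intercept of the untransformed model.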
#Normalizing Data
normalized_BaltimoreSpatial<- scale(Baltimore_spatial)
#Compute Correlation Matrix
BaltimoreSpatialcorr_matrix <- cor(normalized_BaltimoreSpatial)
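ggcorrplot(BaltimoreSpatialcorr_matrix, colors = c("#d8b365", "#f5f5f5", "#5ab4ac"))
#Application
# Note: princomp() treats the 9 rows of this correlation matrix as observations;
# princomp(Baltimore_spatial, cor = TRUE) would decompose the correlation matrix itself
PCA_BaltimoreSpatial <- princomp(BaltimoreSpatialcorr_matrix)
summary(PCA_BaltimoreSpatial)
## Importance of components: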
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 1.0418731 0.4370278 0.35106606 0.28679661 0.25999160
## Proportion of Variance 0.6632595 0.1167003 0.07530633 0.05025761 0.04130213
## Cumulative Proportion 0.6632595 0.7799598 0.85526618 0.90552379 0.94682592
## Comp.6 Comp.7 Comp.8 Comp.9
## Standard deviation 0.21996102 0.18602066 0.063552232 4.474720e-09
## Proportion of Variance 0.02956278 0.02114347 0.002467831 1.223448e-17
## Cumulative Proportion 0.97638870 0.99753217 1.000000000 1.000000e+00
fviz_pca_var(PCA_BaltimoreSpatial, col.var = "black")
PCA_BaltimoreSpatial$loadings[, 1:2]
## Comp.1 Comp.2
## hsdiplomaProp 0.30231085 0.43793235
## college1Prop 0.01234255 0.52334807
## somecollegeProp 0.17393041 -0.08761831
## assProp -0.11608247 0.26342131
## bachelorProp -0.44438150 -0.35230643
## GEDProp 0.27441782 0.22656530
## p12Prop 0.37641198 0.10080760
## WhiteProp -0.47314656 0.37119579
## POCProp 0.47581003 -0.36586084
BaltimoreSpatialpcs<-as.matrix(normalized_BaltimoreSpatial %*% PCA_BaltimoreSpatial$loadings[, 1:2])
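# Note: this PCA uses Baltimore_finality3 rather than Baltimore_spatial, so the
# models below reproduce the original Baltimore models exactly
BaltimoreSpatialpc <- prcomp(Baltimore_finality3, center = TRUE, scale. = TRUE)
Baltimore2 <- as.data.frame(cbind(spatialmismatch = Baltimore_finality$spatialmismatch, BaltimoreSpatialpc$x[, 1:2]))
# Baltimore Fitting Linear Model PCA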
M2_PCA_BaltimoreSpatial <- lm(spatialmismatch ~ PC1 + PC2, data = Baltimore2)
M21_PCA_BaltimoreSpatial <-lm(spatialmismatch ~ PC2, data=Baltimore2)
summary(M2_PCA_BaltimoreSpatial)
##
## Call:
## lm(formula = spatialmismatch ~ PC1 + PC2, data = Baltimore2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.09429 -0.03602 -0.01009 0.02535 0.23800
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0860580 0.0010714 80.323 < 2e-16 ***
## PC1 -0.0001101 0.0006290 -0.175 0.861030
## PC2 -0.0032319 0.0008665 -3.730 0.000197 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04718 on 1936 degrees of freedom
## Multiple R-squared: 0.007151, Adjusted R-squared: 0.006125
## F-statistic: 6.972 on 2 and 1936 DF, p-value: 0.0009619
summary(M21_PCA_BaltimoreSpatial)
##
## Call:
## lm(formula = spatialmismatch ~ PC2, data = Baltimore2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.09433 -0.03617 -0.01014 0.02542 0.23814
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0860580 0.0010711 80.343 < 2e-16 ***
## PC2 -0.0032319 0.0008663 -3.731 0.000196 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04717 on 1937 degrees of freedom
## Multiple R-squared: 0.007135, Adjusted R-squared: 0.006622
## F-statistic: 13.92 on 1 and 1937 DF, p-value: 0.0001963
Test of the difference between two independent correlations (Fisher's z):
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
r.test(n = 2477, cor(Seattle_finality$spatialmismatch, Seattle_finality3$WhiteProp), cor(Baltimore_finality$spatialmismatch, Baltimore_finality$WhiteProp), n2 = 1939, twotailed = TRUE)
## Correlation tests
## Call:r.test(n = 2477, r12 = cor(Seattle_finality$spatialmismatch,
## Seattle_finality3$WhiteProp), r34 = cor(Baltimore_finality$spatialmismatch,
## Baltimore_finality$WhiteProp), n2 = 1939, twotailed = TRUE)
## Test of difference between two independent correlations
## z value 5.66 with probability 0
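Given `n`, `n2` and two correlations, `r.test` tests the difference between two independent correlations, which is based on Fisher's r-to-z transformation. A minimal base-R version follows; `fisher_z_test` is a hypothetical helper name and the input correlations in the example are made up.

```r
# Fisher z test for the difference between two independent correlations
fisher_z_test <- function(r1, n1, r2, n2) {
  # atanh() is Fisher's r-to-z transformation
  z <- (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  c(z = z, p = 2 * pnorm(-abs(z)))  # two-tailed p-value
}

# Hypothetical correlations, for illustration only
fisher_z_test(r1 = 0.5, n1 = 103, r2 = 0.2, n2 = 103)

# Two-tailed p-value implied by the reported z of 5.66
2 * pnorm(-5.66)
```

A reported probability of 0 should therefore be read as a very small p-value rather than exactly zero.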
### PCA Analysis Combined
final<- finality %>%
select(WhiteProp,AsianProp, HispanicProp, BlackProp, hsdiplomaProp,hsnodiplomaProp,somecollegeProp,college1Prop,assProp,bachelorProp)
#Normalizing Data
normalized_final<- scale(final)
#Compute Correlation Matrix
corr_matrix <- cor(normalized_final)
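ggcorrplot(corr_matrix, colors = c("#d8b365", "#f5f5f5", "#5ab4ac"))
#Application
# Note: princomp() treats the 10 rows of this correlation matrix as observations;
# princomp(final, cor = TRUE) would decompose the correlation matrix itself
PCA_final <- princomp(corr_matrix)
summary(PCA_final)
## Importance of components: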
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 0.8467891 0.5236359 0.37041998 0.29953944 0.2706628
## Proportion of Variance 0.4960415 0.1896821 0.09491969 0.06206912 0.0506786
## Cumulative Proportion 0.4960415 0.6857236 0.78064325 0.84271237 0.8933910
## Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## Standard deviation 0.24631311 0.22810851 0.1894788 0.074179519 5.810361e-09
## Proportion of Variance 0.04197035 0.03599569 0.0248364 0.003806585 2.335467e-17
## Cumulative Proportion 0.93536132 0.97135701 0.9961934 1.000000000 1.000000e+00
fviz_pca_var(PCA_final, col.var = "cos2", gradient.cols = c("black", "orange", "green"), repel = TRUE)
PCA_final$loadings[, 1:2]
## Comp.1 Comp.2
## WhiteProp 0.41349718 0.51129567
## AsianProp 0.21954347 -0.44231382
## HispanicProp -0.09078244 -0.08873044
## BlackProp -0.47349571 -0.30313996
## hsdiplomaProp -0.42977777 0.24335990
## hsnodiplomaProp -0.24899716 -0.03599014
## somecollegeProp -0.17752024 0.21661304
## college1Prop -0.14188200 0.44535513
## assProp 0.07031107 0.32158542
## bachelorProp 0.49506060 -0.18430784
finalpcs<-as.matrix(normalized_final %*% PCA_final$loadings[, 1:2])
finalpc <- prcomp(final, center = TRUE, scale. = TRUE)
final2 <- as.data.frame(cbind(spatialmismatch=finality3$spatialmismatch, finalpc$x[,1:2]))
final3 <- as.data.frame(cbind(spatialmismatch=finality3$spatialmismatch, MSA=finality3$MSA,finalpc$x[,1:2]))
final3<-final3 %>%
  mutate(City = ifelse(MSA == "Seattle", 1, 0))
### Final regression
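With the `City` indicator in place, one possible way to ask whether the PC1 relationship differs between Seattle and Baltimore within a single model is an interaction term. This is a sketch on simulated data: the data frame `d` is a hypothetical stand-in for `final3`, and the effect sizes are invented.

```r
set.seed(2024)
# Hypothetical stand-in for final3: component scores plus a city indicator
n_sea <- 300
n_bal <- 250
d <- data.frame(PC1 = rnorm(n_sea + n_bal),
                City = rep(c(1, 0), c(n_sea, n_bal)))  # 1 = Seattle, 0 = Baltimore
d$spatialmismatch <- 0.08 - 0.002 * d$PC1 - 0.006 * d$PC1 * d$City +
  rnorm(nrow(d), sd = 0.035)

# PC1 * City expands to PC1 + City + PC1:City
fit <- lm(spatialmismatch ~ PC1 * City, data = d)
coef(summary(fit))["PC1:City", ]
```

The `PC1:City` row estimates how much the PC1 slope in Seattle differs from the slope in Baltimore, with its own t test.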
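cor(final2$spatialmismatch, final2$PC1)
## [1] 0.1690781
cor(final2$spatialmismatch, final2$PC2)
## [1] -0.1128194
# Note: both correlations come from the same combined sample and share spatialmismatch,
# so they are dependent; the independent-samples form below (n, n2) treats them as if they were not
r.test(n = 2477, cor(final2$spatialmismatch, final2$PC1), cor(final2$spatialmismatch, final2$PC2), n2 = 1939, twotailed = TRUE)
## Correlation tests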
## Call:r.test(n = 2477, r12 = cor(final2$spatialmismatch, final2$PC1),
## r34 = cor(final2$spatialmismatch, final2$PC2), n2 = 1939,
## twotailed = TRUE)
## Test of difference between two independent correlations
## z value 9.36 with probability 0
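Because `cor(final2$spatialmismatch, final2$PC1)` and `cor(final2$spatialmismatch, final2$PC2)` are computed on the same rows and share spatial mismatch, they are dependent correlations, which Williams' test is designed for; `psych::r.test` can also run it when given `r12`, `r13`, `r23` and `n`. The sketch below is a base-R version. The combined sample size \(2477 + 1939\) is an assumption, \(r_{23} = 0\) relies on `prcomp()` scores being uncorrelated, and since PC signs are arbitrary it also compares magnitudes.

```r
# Williams' t for two dependent correlations sharing one variable:
# r12 = cor(y, a), r13 = cor(y, b), r23 = cor(a, b), all from the same n rows
williams_t <- function(r12, r13, r23, n) {
  detR <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23  # det of the 3x3 correlation matrix
  rbar <- (r12 + r13) / 2
  t <- (r12 - r13) * sqrt(((n - 1) * (1 + r23)) /
       (2 * ((n - 1) / (n - 3)) * detR + rbar^2 * (1 - r23)^3))
  c(t = t, df = n - 3, p = 2 * pt(-abs(t), df = n - 3))
}

n_comb <- 2477 + 1939   # assumed combined sample size
r_pc1 <- 0.1690781      # cor(spatialmismatch, PC1) from the output above
r_pc2 <- -0.1128194     # cor(spatialmismatch, PC2) from the output above

# prcomp() scores are uncorrelated, so r23 = 0
signed    <- williams_t(r_pc1, r_pc2, 0, n_comb)
magnitude <- williams_t(r_pc1, abs(r_pc2), 0, n_comb)  # PC signs are arbitrary
rbind(signed, magnitude)
```

The signed comparison mostly reflects the opposite signs of the two correlations; the magnitude comparison asks whether PC1 is more strongly related to spatial mismatch than PC2.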
Combine the components and then find the correlations; spatial mismatch is related to different components, and then try the cities separately.
Combine the different cities into one PCA and then run correlations based on the PCs.
…